
Gaussian Initialization



Escaping from the Barren Plateau via Gaussian Initializations in Deep Variational Quantum Circuits

Neural Information Processing Systems

Variational quantum circuits have been widely employed in quantum simulation and quantum machine learning in recent years. However, quantum circuits with random structures have poor trainability due to the exponentially vanishing gradient with respect to the circuit depth and the qubit number. This result leads to a general standpoint that deep quantum circuits would not be feasible for practical tasks. In this work, we propose an initialization strategy with theoretical guarantees for the vanishing gradient problem in general deep quantum circuits. Specifically, we prove that under proper Gaussian initialized parameters, the norm of the gradient decays at most polynomially when the qubit number and the circuit depth increase. Our theoretical results hold for both the local and the global observable cases, where the latter was believed to have vanishing gradients even for very shallow circuits. Experimental results verify our theoretical findings in quantum simulation and quantum chemistry.
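
To make the initialization strategy concrete, here is a minimal NumPy sketch of Gaussian-initialized circuit parameters whose standard deviation shrinks with the circuit depth. The specific 1/sqrt(depth) scaling and the (depth x qubits) parameter layout are illustrative assumptions, not the paper's exact prescription.

```python
import numpy as np

def gaussian_init_params(num_qubits: int, depth: int, rng=None):
    """Draw rotation angles for a depth-L circuit on n qubits from a
    zero-mean Gaussian whose variance shrinks with depth.

    The 1/depth variance scaling is an illustrative assumption standing in
    for the paper's theoretically derived schedule.
    """
    rng = np.random.default_rng(rng)
    sigma = 1.0 / np.sqrt(depth)          # std dev decays with circuit depth
    return rng.normal(loc=0.0, scale=sigma, size=(depth, num_qubits))

# Example: 8 qubits, 50 layers. Small angles keep the circuit close to the
# identity at initialization, which is what keeps the gradient norm from
# vanishing exponentially with depth and qubit number.
theta = gaussian_init_params(num_qubits=8, depth=50, rng=0)
print(theta.shape, theta.std())
```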


Reparameterized LLM Training via Orthogonal Equivalence Transformation

Qiu, Zeju, Buchholz, Simon, Xiao, Tim Z., Dax, Maximilian, Schölkopf, Bernhard, Liu, Weiyang

arXiv.org Artificial Intelligence

While large language models (LLMs) are driving the rapid advancement of artificial intelligence, effectively and reliably training these large models remains one of the field's most significant challenges. To address this challenge, we propose POET, a novel reParameterized training algorithm that uses Orthogonal Equivalence Transformation to optimize neurons. Specifically, POET reparameterizes each neuron with two learnable orthogonal matrices and a fixed random weight matrix. Because of its provable preservation of spectral properties of weight matrices, POET can stably optimize the objective function with improved generalization. We further develop efficient approximations that make POET flexible and scalable for training large-scale neural networks. Extensive experiments validate the effectiveness and scalability of POET in training LLMs.
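
As a rough illustration of the reparameterization described in the abstract, the PyTorch sketch below wraps a fixed random weight matrix between two learnable factors kept orthogonal via torch.nn.utils.parametrizations.orthogonal. The class name POETLinear and the exact composition R W0 P are assumptions based on the abstract, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

class POETLinear(nn.Module):
    """Sketch of an orthogonal-equivalence reparameterized linear layer:
    the effective weight is R @ W0 @ P, with W0 fixed at random init and
    R, P constrained to be orthogonal throughout training."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Fixed random weight matrix (never updated).
        self.register_buffer(
            "W0", torch.randn(out_features, in_features) / in_features ** 0.5
        )
        # Learnable orthogonal factors, kept orthogonal via parametrization.
        self.R = orthogonal(nn.Linear(out_features, out_features, bias=False))
        self.P = orthogonal(nn.Linear(in_features, in_features, bias=False))

    def forward(self, x):
        # The product R W0 P preserves the singular values of W0.
        W = self.R.weight @ self.W0 @ self.P.weight
        return x @ W.T

layer = POETLinear(64, 128)
y = layer(torch.randn(4, 64))
print(y.shape)  # torch.Size([4, 128])
```

Because orthogonal transformations preserve singular values, the effective weight keeps the spectrum of the fixed random matrix, which is the spectral-preservation property the abstract attributes to POET.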



Global Convergence of Four-Layer Matrix Factorization under Random Initialization

Luo, Minrui, Xu, Weihang, Gao, Xiang, Fazel, Maryam, Du, Simon Shaolei

arXiv.org Artificial Intelligence

Gradient descent dynamics on the deep matrix factorization problem is extensively studied as a simplified theoretical model for deep neural networks. Although the convergence theory for two-layer matrix factorization is well-established, no global convergence guarantee for general deep matrix factorization under random initialization has been established to date. To address this gap, we provide a polynomial-time global convergence guarantee for randomly initialized gradient descent on four-layer matrix factorization, given certain conditions on the target matrix and a standard balanced regularization term. Our analysis employs new techniques to show saddle-avoidance properties of gradient descent dynamics, and extends previous theories to characterize the change in eigenvalues of layer weights. Following a long line of works (Arora et al., 2019a; Jiang et al., 2023; Ye & Du, 2021; Chou et al., 2024), we aim to understand the dynamics of gradient descent (GD) on this problem: $\min_{W_1, \ldots, W_N} \tfrac{1}{2} \| W_N W_{N-1} \cdots W_1 - M \|_F^2$, where the factor matrices $W_j$, $j = 1, \ldots, N$, and the target $M$ have entries in $\mathbb{F}$. Here $\mathbb{F} \in \{\mathbb{C}, \mathbb{R}\}$, as we consider both real and complex matrices in this paper. While the model representation power is independent of the depth $N$, the deep matrix factorization problem is naturally motivated by the goal of understanding the benefits of depth in deep learning (see, e.g., Arora et al. (2019b)).
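
For readers who want to experiment with this setting, the sketch below runs randomly initialized gradient descent on a four-layer factorization with a balanced regularizer. The target matrix, initialization scale, step size, and regularization weight are illustrative assumptions and do not reproduce the paper's conditions.

```python
import torch

# Minimal sketch: randomly initialized gradient descent on four-layer
# matrix factorization with a balanced regularization term.
torch.manual_seed(0)
d, r = 20, 3
M = torch.randn(d, r) @ torch.randn(r, d)                    # low-rank target
Ws = [(0.1 * torch.randn(d, d)).requires_grad_() for _ in range(4)]

lam, lr = 1e-2, 5e-3
for step in range(2000):
    prod = Ws[3] @ Ws[2] @ Ws[1] @ Ws[0]
    loss = 0.5 * (prod - M).pow(2).sum()
    # Balanced regularization: encourage adjacent layers to share Gram matrices.
    for j in range(3):
        loss = loss + lam * (Ws[j + 1].T @ Ws[j + 1] - Ws[j] @ Ws[j].T).pow(2).sum()
    loss.backward()
    with torch.no_grad():
        for W in Ws:
            W -= lr * W.grad
            W.grad.zero_()
    if step % 500 == 0:
        print(f"step {step}: loss = {loss.item():.3f}")
```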


Algorithm-Dependent Generalization Bounds for Overparameterized Deep Residual Networks

Frei, Spencer, Cao, Yuan, Gu, Quanquan

Neural Information Processing Systems

Although deep learning has seen rapid and widespread adoption, the theoretical understanding of why it works so well has lagged significantly. This is particularly the case in the common setup of an overparameterized network, where the number of parameters in the network greatly exceeds the number of training examples and the input dimension.


GS-SDF: LiDAR-Augmented Gaussian Splatting and Neural SDF for Geometrically Consistent Rendering and Reconstruction

Liu, Jianheng, Wan, Yunfei, Wang, Bowen, Zheng, Chunran, Lin, Jiarong, Zhang, Fu

arXiv.org Artificial Intelligence

Digital twins are fundamental to the development of autonomous driving and embodied artificial intelligence. However, achieving high-granularity surface reconstruction and high-fidelity rendering remains a challenge. Gaussian splatting offers efficient photorealistic rendering but struggles with geometric inconsistencies due to fragmented primitives and sparse observational data in robotics applications. Existing regularization methods, which rely on render-derived constraints, often fail in complex environments. Moreover, effectively integrating sparse LiDAR data with Gaussian splatting remains challenging. We propose a unified LiDAR-visual system that synergizes Gaussian splatting with a neural signed distance field. The accurate LiDAR point clouds enable a trained neural signed distance field to provide a manifold geometry field. This motivates us to introduce an SDF-based Gaussian initialization for physically grounded primitive placement and a comprehensive geometric regularization for geometrically consistent rendering and reconstruction. Experiments demonstrate superior reconstruction accuracy and rendering quality across diverse trajectories. To benefit the community, the codes will be released at https://github.com/hku-mars/GS-SDF.
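
As a rough idea of what an SDF-based Gaussian initialization could look like, the sketch below projects candidate points onto the zero level set of a signed distance field before using them as primitive centers. The sdf and grad_sdf callables stand in for a trained neural SDF, and the projection rule is an assumption for illustration, not the GS-SDF authors' method.

```python
import numpy as np

def project_to_surface(points, sdf, grad_sdf):
    """Move candidate points onto the SDF zero level set:
    x_surface = x - sdf(x) * normalize(grad_sdf(x)).
    `sdf` and `grad_sdf` are assumed callables standing in for a trained
    neural signed distance field."""
    d = sdf(points)[:, None]                        # signed distances
    n = grad_sdf(points)
    n = n / (np.linalg.norm(n, axis=1, keepdims=True) + 1e-8)
    return points - d * n

# Toy example: a unit sphere plays the role of the "trained" SDF.
sdf = lambda x: np.linalg.norm(x, axis=1) - 1.0
grad = lambda x: x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)

lidar_pts = np.random.randn(1000, 3) * 1.2          # noisy samples near the surface
centers = project_to_surface(lidar_pts, sdf, grad)
print(np.abs(sdf(centers)).max())                   # ~0: primitive centers lie on the surface
```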


Reviews: Initialization of ReLUs for Dynamical Isometry

Neural Information Processing Systems

The response did elaborate on the relationship between the approaches to ReLU initialization considered and the earlier portion of the paper; this relationship should be made clearer in the paper itself. However, as pointed out by the other reviewers, the structure in the proposed Gaussian submatrix initialization has previously been proposed in Balduzzi et al. [2]. The paper analyzes how signals are transformed through the layers of a feedforward neural network, assuming weights are initialized from Gaussian distributions. Previous work used a mean-field assumption to study these dynamics and used the results to identify parameters for the Gaussians that ensure stable propagation of the mean of the signal variance through the layers, a necessary condition for training deep networks. This work considers how the distribution of the initial signal variance is transformed through the layers of the network.
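
The variance-propagation argument can be checked numerically. The sketch below uses the standard He scaling N(0, 2/fan_in) for Gaussian weights, which is an assumption standing in for the paper's specific initialization, to show that the mean squared activation neither explodes nor vanishes through many ReLU layers.

```python
import numpy as np

# Mean-field-style sanity check: with weights drawn i.i.d. from
# N(0, 2 / fan_in) (the He scaling for ReLU), the mean squared activation
# stays roughly constant as the signal propagates through many layers.
# This illustrates the variance-propagation argument summarized above,
# not the specific Gaussian submatrix initialization discussed in the review.
rng = np.random.default_rng(0)
width, depth = 512, 30
x = rng.normal(size=width)
for layer in range(1, depth + 1):
    W = rng.normal(scale=np.sqrt(2.0 / width), size=(width, width))
    x = np.maximum(W @ x, 0.0)                      # ReLU activation
    if layer % 10 == 0:
        print(f"layer {layer}: mean squared activation = {(x ** 2).mean():.3f}")
```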

